SPID4.5: A Selective Pseudo Iterative Deletion Discretization Algorithm for Machine Learning, Uncertain Reasoning & Pattern Recognition

Authors

  • Somnath Pal
  • Himika Biswas
  • Debasis Chakraborty
  • Ananda Mohan Ghosh
Abstract

Many machine learning algorithms developed for classification, prediction, and uncertain reasoning cannot handle continuous features. To use them on real-world data sets, continuous attributes must be discretized into a small number of distinct ranges. Discretization also provides insight into the critical values of continuous attributes. In this work (SPID4.5), an improvement on our previously published algorithm SPID3, we iteratively compute a pseudo deletion count at each boundary point of every continuous attribute and accept the threshold points that reduce noise in the database the most. This successive reduction of noise yields a better discretization. Extensive empirical experiments on real-world datasets show that state-of-the-art learning algorithms such as CN2, C4.5, RISE, and Naïve Bayes improve in performance with SPID4.5 discretization, and comparison with other supervised discretization methods such as MDLP and Chi-Merge shows that SPID4.5 achieves highly competitive performance.
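The abstract only sketches the selection loop at a high level. Below is a minimal illustrative sketch in Python of such a greedy, noise-driven cut-point search. The noise measure used here (instances that disagree with the majority class of their bin) and the stopping rule are assumptions made for illustration; the exact pseudo deletion count used by SPID4.5 is defined in the full paper and is not reproduced here.

```python
# Illustrative sketch of an iterative, noise-driven cut-point selection loop.
# NOTE: the "noise" measure below is an assumption for illustration; the actual
# pseudo deletion count of SPID4.5 is defined in the paper, not here.
from collections import Counter
from bisect import bisect_right

def candidate_boundaries(values, labels):
    """Midpoints between consecutive distinct values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 != v2 and c1 != c2:
            cuts.append((v1 + v2) / 2.0)
    return sorted(set(cuts))

def noise(values, labels, cuts):
    """Instances whose class disagrees with the majority class of their bin."""
    bins = {}
    for v, c in zip(values, labels):
        bins.setdefault(bisect_right(cuts, v), []).append(c)
    return sum(len(cs) - Counter(cs).most_common(1)[0][1] for cs in bins.values())

def select_cut_points(values, labels):
    """Greedily accept the boundary that reduces noise most, until none helps."""
    accepted = []
    remaining = candidate_boundaries(values, labels)
    current = noise(values, labels, accepted)
    while remaining:
        scored = [(noise(values, labels, sorted(accepted + [b])), b) for b in remaining]
        best_noise, best_cut = min(scored)
        if best_noise >= current:  # no remaining boundary reduces noise further
            break
        accepted = sorted(accepted + [best_cut])
        remaining.remove(best_cut)
        current = best_noise
    return accepted
```

For example, select_cut_points([4.9, 5.0, 5.8, 6.1, 6.3], ['a', 'a', 'b', 'b', 'b']) returns [5.4]: the single class-change boundary drops the noise count from 2 to 0, after which no candidate remains and the loop stops.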



Journal title:

Volume   Issue

Pages  -

Publication date: 2004